Sensory Marketing & Product Innovation

Exercise - Hearing test

Author
Affiliation
Published

Friday, March 21, 2025


Follow me on ResearchGate

Connect with me on Open Science Framework | Contact me via LinkedIn

It might be necessary to use right-click -> open in a new browser window depending on your machine.


R analysis script presenting part of the solutions for exercise on the hearing test (see last exercise).

The purpose of this script is not solely to answer the exercise question. Moreover, studying these scripts should help students become familiar with some aspects of working in R.

If this grabs your attention

If this exercise captures your interest, please take a look at our master study programs at Otto von Guericke University in Magdeburg (Germany), by clicking on the logo:

Faculty of Economics and Management at Otto-von-Guericke-University Magdeburg

1 Loading packages

Beware!

R is a context-sensitive language. Thus, ‘data’ will not be interpreted the same way as ‘Data’.

In R most functionality is provided by additional packages.
Most of the packages are well-documented; see: https://cran.r-project.org/

  1. The code chunk below first evaluates if the package pacman (Rinker & Kurkiewicz, 2018) has already been installed on your machine. If yes, the corresponding package will be loaded. If not, R will install the package.

  2. Alternatively, you can do this manually first by executing install.packages(“pacman”) and then library(pacman).

  3. The second line then loads the package pacman.

  4. The third line uses the function p_load() from the pacman package to install (if necessary) and loads all packages that we provide as arguments.

if (!"pacman" %in% rownames(installed.packages())) install.packages("pacman")

library(pacman)

pacman::p_load(haven, hrbrthemes, kableExtra, knitr, labelled, modelsummary, openxlsx, tidyverse)

In all code chunks throughout this script, you can get additional help on each function used by clicking on its name (or by right-clicking and then opening it in a new browser tab). Alternatively, you can see which arguments a function accepts while coding by pressing ‘F1’ with the cursor positioned over the function’s name.


Here is the R session info, which gives you information on my machine, all loaded packages, and their version:

R version 4.4.3 (2025-02-28 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26100)

Matrix products: default


locale:
[1] LC_COLLATE=German_Germany.utf8  LC_CTYPE=German_Germany.utf8   
[3] LC_MONETARY=German_Germany.utf8 LC_NUMERIC=C                   
[5] LC_TIME=German_Germany.utf8    

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] openxlsx_4.2.8     quarto_1.4.4       ggpubr_0.6.0       lubridate_1.9.4   
 [5] forcats_1.0.0      stringr_1.5.1      dplyr_1.1.4        purrr_1.0.2       
 [9] readr_2.1.5        tidyr_1.3.1        tibble_3.2.1       ggplot2_3.5.1     
[13] tidyverse_2.0.0    modelsummary_2.2.0 labelled_2.14.0    knitr_1.49        
[17] kableExtra_1.4.0   hrbrthemes_0.8.7   haven_2.5.4        downloadthis_0.4.1
[21] pacman_0.5.1      

loaded via a namespace (and not attached):
 [1] gtable_0.3.6            xfun_0.49               htmlwidgets_1.6.4      
 [4] processx_3.8.5          insight_1.0.1           rstatix_0.7.2          
 [7] tzdb_0.4.0              ps_1.8.1                vctrs_0.6.5            
[10] tools_4.4.3             generics_0.1.3          pkgconfig_2.0.3        
[13] data.table_1.16.4       lifecycle_1.0.4         compiler_4.4.3         
[16] munsell_0.5.1           carData_3.0-5           fontquiver_0.2.1       
[19] fontLiberation_0.1.0    htmltools_0.5.8.1       yaml_2.3.10            
[22] Formula_1.2-5           Rttf2pt1_1.3.12         later_1.4.1            
[25] pillar_1.10.1           car_3.1-3               extrafontdb_1.0        
[28] abind_1.4-8             fontBitstreamVera_0.1.1 zip_2.3.1              
[31] tidyselect_1.2.1        digest_0.6.37           stringi_1.8.4          
[34] extrafont_0.19          fastmap_1.2.0           grid_4.4.3             
[37] colorspace_2.1-1        cli_3.6.2               magrittr_2.0.3         
[40] broom_1.0.7             withr_3.0.2             gdtools_0.4.1          
[43] scales_1.3.0            backports_1.5.0         timechange_0.3.0       
[46] rmarkdown_2.29          ggsignif_0.6.4          hms_1.1.3              
[49] evaluate_1.0.3          viridisLite_0.4.2       rlang_1.1.4            
[52] Rcpp_1.0.14             glue_1.8.0              xml2_1.3.6             
[55] svglite_2.1.3           rstudioapi_0.17.1       jsonlite_1.8.9         
[58] R6_2.5.1                tables_0.9.31           systemfonts_1.2.1      

2 Importing the data

For your master’s thesis, it might be a good idea to replicate the IAT findings of Study 1 (D-score > 0, t-test). Based on a power analysis (Assumptions: 𝛼=5%, Power=0.8), how many subjects should you recruit for this study?

The data set is stored as an Microsoft Excel file (.xlsx) within the following directory: https://pub.ww.ovgu.de/lichters/smpi/data/hearing_test.xlsx

The data set’s name is hearing_test.xlsx.

We download the file to our current R project into an object named d. for this purpose, we draw on the function read.xlsx() from the package openxlsx (Schauberger & Walker, 2023).

Below, we tell the function to read the first sheet of the Excel file, check the names of the columns, and separate the non-standard names by an underscore.

d <- read.xlsx("https://pub.ww.ovgu.de/lichters/smpi/data/hearing_test.xlsx", 
             sheet = 1, 
             check.names = TRUE, 
             sep.names = "_") %>% as_tibble()

If anything does not work out as expected on your machine, you can download the data by clicking on the following button.

Download the Excel file manually


3 Data inspection

Let’s take a look at the first few rows of the data set to get an idea of its structure.

d
# A tibble: 27 × 4
   Gender   Age Frequency  Year
   <chr>  <dbl>     <dbl> <dbl>
 1 f         21     16000  2022
 2 f         23     16400  2022
 3 m         25     15850  2022
 4 m         26     16302  2022
 5 f         21     16000  2022
 6 f         22     18000  2022
 7 f         28     18000  2022
 8 f         26     13711  2022
 9 f         25     14287  2022
10 f         24     16222  2022
# ℹ 17 more rows

4 Using linear regression to explain the highest perceivable frequency by age

model_age <- lm(d$Frequency ~ d$Age)

summary(model_age)

Call:
lm(formula = d$Frequency ~ d$Age)

Residuals:
     Min       1Q   Median       3Q      Max 
-1867.49 -1003.37   -13.47   583.04  2991.46 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 22987.83    1315.29  17.477 1.58e-15 ***
d$Age        -284.97      50.74  -5.617 7.63e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1204 on 25 degrees of freedom
Multiple R-squared:  0.5579,    Adjusted R-squared:  0.5402 
F-statistic: 31.55 on 1 and 25 DF,  p-value: 7.629e-06

5 Is capability to hear high frequencies a function of the interaction between age and gender?

We can first visualize the relationship between the highest perceivable frequency and age for both sexes in isolation.

ggplot(d, aes(x = Age, y = Frequency, color = Gender)) +
  geom_smooth(method = "lm", fill = "#AEC6D2", se = TRUE) +
  geom_point(alpha = 0.8, shape = 21, size = 2, stroke = 1) +
  theme_minimal() +
  annotate(geom = "text", x = 30, y = 17000, label = paste(
    "Model across both groups : ", "\n\n",
    round(model_age$coefficients[1], digits = 2),
    " - ",
    abs(round(model_age$coefficients[2], digits = 2)),
    "*years", "\n\n", "R²(adjusted) = ",
    round(summary(model_age)$adj.r.squared, digits = 2)
  ), hjust = "left") +
  theme(
    plot.background = element_rect(fill = "#F2F6F8"),
    panel.background = element_rect(fill = "grey98"),
    panel.border = element_rect(colour = "black", fill = NA, size = 0.7),
    text = element_text(size = 12),
    axis.title = element_text(size = 13)
  )

Next, let’s try a regression that includes the interaction and main effects of age and gender. The question is whether the slopes for both sexes are different enough to suggest that they likely do not come from the same data-generating process.

model_moderation_regression <- lm(d$Frequency ~ d$Age*d$Gender)

summary(model_moderation_regression)

Call:
lm(formula = d$Frequency ~ d$Age * d$Gender)

Residuals:
    Min      1Q  Median      3Q     Max 
-2059.3  -495.6   199.6   553.2  2459.8 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)      18761.7     3124.9   6.004 4.02e-06 ***
d$Age             -115.1      129.4  -0.889   0.3831    
d$Genderm         6422.2     3593.1   1.787   0.0871 .  
d$Age:d$Genderm   -237.4      143.0  -1.660   0.1105    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1164 on 23 degrees of freedom
Multiple R-squared:  0.6194,    Adjusted R-squared:  0.5698 
F-statistic: 12.48 on 3 and 23 DF,  p-value: 4.763e-05

Alternatively, more insights can be obtained by using a step-wise regression approach. We use the modelsummary package (Arel-Bundock, 2022) to focus on the model differences, which provides a function with the same name.

Below, we establish a regression model with only participants’ gender and age as predictors. We then, apply the modelsummary function to compare all three models.

model_age_gender <- lm(d$Frequency ~ d$Age + d$Gender)

models <- list(
  "Model 1" = model_age,
  "Model 1" = model_age_gender,
  "Model 3" = model_moderation_regression
)
modelsummary(models,
  stars = FALSE,
  statistic = c("std.error", "p.value"),
  shape = term ~ model + statistic,
  fmt = 3,
  coef_rename = TRUE,
  align = "lrrrrrrrrr",
  gof_omit = "AIC|RMSE|Log",
  notes = c("Notes: Dependent variable is highest perceivable frequency in Hertz, independent varibles are: age in years, gender (female vs. male), and their interaction. The upper table area shows unstandardized model coefficients [Est.], non-robust standard errors [S.E.] and p-values [p].")
)
Model 1 Model 1 Model 3
Est. S.E. p Est. S.E. p Est. S.E. p
Notes: Dependent variable is highest perceivable frequency in Hertz, independent varibles are: age in years, gender (female vs. male), and their interaction. The upper table area shows unstandardized model coefficients [Est.], non-robust standard errors [S.E.] and p-values [p].
(Intercept) 22987.835 1315.289 <0.001 23438.299 1401.129 <0.001 18761.693 3124.925 <0.001
d$Age -284.975 50.737 <0.001 -309.463 57.034 <0.001 -115.054 129.403 0.383
d$Genderm 523.309 552.349 0.353 6422.151 3593.084 0.087
d$Age × d$Genderm -237.378 142.990 0.110
Num.Obs. 27 27 27
R2 0.558 0.574 0.619
R2 Adj. 0.540 0.538 0.570
BIC 467.5 469.8 470.0
F 31.547 16.158 12.479

Finally, we use a more prominent method to assess the moderating role of gender in the relationship between age and the highest perceivable frequency. We use the PROCESS macro (Hayes, 2022) to estimate the interaction effect, drawing on bootstrap samples.

Below, we load the macro from the internet.

source("https://pub.ww.ovgu.de/lichters/smpi/data/process.R")

********************* PROCESS for R Version 4.3.1 ********************* 
 
           Written by Andrew F. Hayes, Ph.D.  www.afhayes.com              
   Documentation available in Hayes (2022). www.guilford.com/p/hayes3   
 
*********************************************************************** 
 
PROCESS is now ready for use.
Copyright 2020-2023 by Andrew F. Hayes ALL RIGHTS RESERVED
Workshop schedule at http://haskayne.ucalgary.ca/CCRAM
 

If anything does not work out as expected on your machine, you can download the data by clicking on the following button.

If you have downloaded manually, you need to execute source(process.R).


Below, we need to explicitly convert the variable gender into a factor and then into a labelled variable. This is because the PROCESS macro requires the moderator to be a numeric variable.

Subsequently, we run the PROCESS macro. The function requires the data set (data), the dependent variable (y), the independent variable (w), the moderator (w), the model number (which is 1), the centering method, the number of bootstrap samples (boot), and the confidence level (conf). We further set the seed to ensure reproducibility. In addition, we set the progress bar to FALSE.

d$Gender <- d$Gender %>% to_factor() %>% to_labelled()


process(
  data = d,
  y = "Frequency",
  x = "Age",
  w = "Gender",
  model = 1,
  center = 2, jn = 1,
  moments = 1, modelbt = 1,
  boot = 10000, seed = 54545, progress = F,
  conf = 0.95
)

********************* PROCESS for R Version 4.3.1 ********************* 
 
           Written by Andrew F. Hayes, Ph.D.  www.afhayes.com              
   Documentation available in Hayes (2022). www.guilford.com/p/hayes3   
 
*********************************************************************** 
                 
Model : 1        
    Y : Frequency
    X : Age      
    W : Gender   

Sample size: 27

Custom seed: 54545


*********************************************************************** 
Outcome Variable: Frequency

Model Summary: 
          R      R-sq          MSE         F       df1       df2         p
     0.7870    0.6194 1355418.7784   12.4787    3.0000   23.0000    0.0000

Model: 
              coeff        se         t         p       LLCI       ULCI
constant 15461.0587  791.7364   19.5280    0.0000 13823.1577 17098.9597
Age        122.3236  265.8595    0.4601    0.6498  -427.6721   672.3193
Gender     364.6212  541.6891    0.6731    0.5076  -755.9955  1485.2380
Int_1     -237.3778  142.9900   -1.6601    0.1105  -533.1877    58.4320

Product terms key:
Int_1  :  Age  x  Gender      

Test(s) of highest order unconditional interaction(s):
      R2-chng         F       df1       df2         p
X*W    0.0456    2.7559    1.0000   23.0000    0.1105

*********************************************************************** 
Bootstrapping in progress. Please wait.

********** BOOTSTRAP RESULTS FOR REGRESSION MODEL PARAMETERS **********

Outcome variable: Frequency

              Coeff   BootMean    BootSE   BootLLCI   BootULCI
constant 15461.0587 15348.5690 1003.2394 13399.9685 17287.5010
Age        122.3236    85.3801  351.3782  -645.5762   705.5247
Gender     364.6212   432.9738  668.9488  -856.6951  1640.8268
Int_1     -237.3778  -221.5327  212.1079  -584.1524   222.7070

******************** ANALYSIS NOTES AND ERRORS ************************ 

Level of confidence for all confidence intervals in output: 95

Number of bootstraps for percentile bootstrap confidence intervals: 10000
 
NOTE: Confidence level restricted to between 50 and 99.9999%. 95% confidence is provided in output. 
 
NOTE: The following variables were mean centered prior to analysis: 
         Age
 
NOTE: Due to estimation problems, some bootstrap samples had to be replaced. 
      The number of times this happened was: 7

References

Arel-Bundock, V. (2022). Modelsummary: Data and model summaries in r. 103. doi: 10.18637/jss.v103.i01
Hayes, A. F. (2022). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach (3rd ed.). New York; London: The Guilford Press.
Rinker, T. W., & Kurkiewicz, D. (2018). Pacman: Package management for r. Retrieved from http://github.com/trinker/pacman
Schauberger, P., & Walker, A. (2023). Openxlsx: Read, write and edit xlsx files. Retrieved from https://CRAN.R-project.org/package=openxlsx

Reuse

CC BY-NC-ND 4.0, see https://creativecommons.org/licenses/by-nc-nd/4.0/

Copyright

Citation

BibTeX citation:
@misc{lichters2025,
  author = {Lichters, Marcel},
  title = {Sensory {Marketing} \& {Product} {Innovation:} {Exercise} -
    {Hearing} Test},
  date = {2025-03-21},
  url = {https://pub.ww.ovgu.de/lichters/smpi/exercise/Exercise_Hearing_test.html},
  langid = {en}
}
For attribution, please cite this work as:
Lichters, M. (2025, March 21). Sensory Marketing & Product Innovation: Exercise - Hearing test. Retrieved from https://pub.ww.ovgu.de/lichters/smpi/exercise/Exercise_Hearing_test.html